Semantic Change Detection With Gaussian Word Embeddings
Abstract
Diachronic study of the evolution of languages is of importance in natural language processing (NLP). Recent years have witnessed a surge of computational approaches for the detection and characterization of lexical semantic change (LSC), owing to the availability of diachronic corpora and advances in word representation techniques. We propose a Gaussian word embedding (w2g)-based method and present a comprehensive study of LSC detection. W2g is a probabilistic, distribution-based word embedding model that represents words as Gaussian mixture models, using covariance information along with the existing mean (word vector). We also extensively study several aspects of w2g-based LSC detection under the SemEval-2020 Task 1 evaluation framework as well as on the Google N-gram corpus. In Sub-task 1 (LSC binary classification), we report the highest overall ranking, with the top rank in two (German and Swedish) of the four languages (English, Swedish, German and Latin). We also report the highest Spearman correlation in Sub-task 2 (LSC ranking) for Swedish. Our overall rankings are 1st and 7th for the classification and ranking sub-tasks, respectively. A qualitative analysis is also presented.
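The paper's pipeline is not reproduced here, but the core idea of scoring change between a word's Gaussian embeddings from two time periods can be sketched with a symmetric KL divergence. This is a minimal illustration assuming single diagonal-covariance Gaussians (the paper uses Gaussian mixtures); the function names and the toy 4-dimensional vectors are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """KL(N0 || N1) between diagonal-covariance Gaussians, in closed form."""
    return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1
                        - 1.0 + np.log(var1 / var0))

def change_score(mu_t0, var_t0, mu_t1, var_t1):
    """Symmetrized KL between a word's embeddings at two time periods."""
    return (kl_diag_gaussians(mu_t0, var_t0, mu_t1, var_t1)
            + kl_diag_gaussians(mu_t1, var_t1, mu_t0, var_t0))

# Toy example: hypothetical embeddings of one word in two diachronic corpora.
mu_a, var_a = np.array([0.1, 0.2, -0.3, 0.0]), np.full(4, 0.5)
mu_b, var_b = np.array([0.9, -0.4, 0.3, 0.2]), np.full(4, 0.8)
print(change_score(mu_a, var_a, mu_b, var_b))  # larger score = more change
```

An unchanged word (identical mean and covariance in both periods) scores exactly zero; shifts in either the mean or the covariance, which w2g uses to capture meaning breadth, increase the score.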
Similar Resources
Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change
Understanding how words change their meanings over time is key to models of language and cultural evolution, but historical data on meaning is scarce, making theories hard to develop and test. Word embeddings show promise as a diachronic tool, but have not been carefully evaluated. We develop a robust methodology for quantifying semantic change by evaluating word embeddings (PPMI, SVD, word2vec...
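The standard recipe behind such diachronic evaluations can be sketched briefly: align two independently trained embedding matrices with orthogonal Procrustes, then score each word by the cosine distance between its aligned vectors. This is a minimal sketch, not the paper's exact pipeline; the function names and random toy data are illustrative only.

```python
import numpy as np

def align(X, Y):
    """Orthogonal Procrustes: rotation R minimizing ||X @ R - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def change_scores(X, Y):
    """Per-word cosine distance between aligned time-t0 and time-t1 vectors."""
    Xr = X @ align(X, Y)
    num = np.sum(Xr * Y, axis=1)
    den = np.linalg.norm(Xr, axis=1) * np.linalg.norm(Y, axis=1)
    return 1.0 - num / den

# Toy demo: Y is a pure rotation of X, so no word should appear to change.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))            # 50 "words", 8-dim vectors
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
print(change_scores(X, X @ Q).max())    # close to 0
```

Because the two spaces differ only by a rotation here, alignment recovers it and all change scores are near zero; with real diachronic corpora, the highest-scoring words are the change candidates.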
AutoExtend: Combining Word Embeddings with Semantic Resources
We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings which incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an...
Adjusting Word Embeddings with Semantic Intensity Orders
Semantic lexicons such as WordNet and PPDB have been used to improve the vector-based semantic representations of words by adjusting the word vectors. However, such lexicons lack semantic intensity information, inhibiting adjustment of vector spaces to better represent semantic intensity scales. In this work, we adjust word vectors using the semantic intensity information in addition to synonym...
Gaussian Mixture Embeddings for Multiple Word Prototypes
Recently, word representation has been increasingly focused on for its excellent properties in representing the word semantics. Previous works mainly suffer from the problem of polysemy phenomenon. To address this problem, most of previous models represent words as multiple distributed vectors. However, it cannot reflect the rich relations between words by representing words as points in the em...
Gaussian LDA for Topic Models with Word Embeddings
Continuous space word embeddings learned from large, unstructured corpora have been shown to be effective at capturing semantic regularities in language. In this paper we replace LDA’s parameterization of “topics” as categorical distributions over opaque word types with multivariate Gaussian distributions on the embedding space. This encourages the model to group words that are a priori known t...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3120645